Real-Time Rough Extraction of Foreground Objects in MPEG1,2 Compressed Video

Author: BENOIS-PINEAU J.
MANERBA F.
R. LEONARDI
Publication venue: Ecole Polytechnique Fédérale de Lausanne
Publication date: 01/01/2005

This paper describes a new approach to extracting foreground objects from MPEG-1 and MPEG-2 video streams within the framework of the “rough indexing paradigm”, that is, starting from rough data obtained by only partially decoding the compressed stream. In this approach we use both P-frame motion information and I-frame colour information to identify and extract foreground objects. What distinguishes our approach from state-of-the-art methods is, first, a robust estimation of camera motion and its use for localising real objects and filtering out parasite zones. Secondly, roughly segmented objects are filtered spatio-temporally at DC resolution using their motion trajectory and a Gaussian-like shape characteristic function. This paradigm yields content description in real time while maintaining a good level of detail.
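The camera-motion step can be pictured with a minimal sketch. It assumes a purely translational camera model estimated by the component-wise median of the P-frame block motion vectors; the abstract specifies neither the model nor the estimator, so both are assumptions, as are the function names and the `threshold` value. Blocks whose motion deviates from the estimated camera motion are flagged as foreground candidates:

```python
from statistics import median


def estimate_translation(vectors):
    """Robustly estimate a global (camera) translation from block motion vectors.

    Assumption: a purely translational camera model. The component-wise median
    tolerates up to half of the blocks moving independently (foreground).
    """
    dx = median(v[0] for v in vectors)
    dy = median(v[1] for v in vectors)
    return dx, dy


def foreground_mask(vectors, threshold=2.0):
    """Flag blocks whose motion deviates from the estimated camera motion.

    `threshold` is a hypothetical residual limit in pixels.
    """
    cam_dx, cam_dy = estimate_translation(vectors)
    mask = []
    for vx, vy in vectors:
        # Euclidean residual between the block's motion and the camera motion.
        residual = ((vx - cam_dx) ** 2 + (vy - cam_dy) ** 2) ** 0.5
        mask.append(residual > threshold)
    return mask
```

For example, with vectors `[(1, 0), (1, 0), (1, 1), (1, 0), (8, 5), (1, 0)]` the camera motion is estimated as `(1, 0)` and only the fifth block is flagged. A real system would use a richer (e.g. affine) model, but the median already shows why a robust estimator keeps the foreground from biasing the camera estimate.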

Archivio istituzionale della ricerca - Università di Brescia

Evaluation of Explanation Methods of AI -- CNNs in Image Classification Tasks with Reference-based and No-reference Metrics

Author: Benois-Pineau J.
Giot R.
Zhukov A.
Publication date: 21/01/2023

The most popular methods in the AI/machine-learning paradigm are mainly black boxes. This is why explaining AI decisions is urgent. Although dedicated explanation tools have been massively developed, the evaluation of their quality remains an open research question. In this paper, we generalize the methodologies for evaluating post-hoc explainers of CNNs' decisions in visual classification tasks with reference-based and no-reference metrics. We apply them to our previously developed explainers (FEM, MLFEM) and to the popular Grad-CAM. The reference-based metrics are the Pearson correlation coefficient and Similarity, computed between the explanation map and its ground truth, represented by a Gaze Fixation Density Map obtained with a psycho-visual experiment. As a no-reference metric, we use the stability metric proposed by Alvarez-Melis and Jaakkola. We study its behaviour and its consensus with reference-based metrics, and show that under several kinds of degradation of input images, this metric agrees with the reference-based ones. Therefore, it can be used to evaluate the quality of explainers when the ground truth is not available.

Comment: Due to a bug found in the code, all tables and figures were redone. The new results did not change the main conclusion, except for the best explainer: FEM performed better than MLFEM. 25 pages, 16 tables, 16 figures; submitted to "Advances in Artificial Intelligence and Machine Learning" (ISSN: 2582-9793)
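The two reference-based metrics can be sketched as follows. This assumes both maps are flattened to equal-length lists of non-negative values and that Similarity follows the usual saliency-benchmark definition (histogram intersection of the two maps after normalising each to unit sum); the function names are illustrative, not taken from the paper:

```python
import math


def pearson_cc(a, b):
    """Pearson correlation coefficient between two maps flattened to lists."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def similarity(a, b):
    """SIM: sum of element-wise minima of the two maps, each normalised to sum 1."""
    sa, sb = sum(a), sum(b)
    return sum(min(x / sa, y / sb) for x, y in zip(a, b))
```

Both scores equal 1 when the explanation map is proportional to the gaze density map. PCC falls to -1 for an inverted map, while SIM stays in [0, 1] because it measures overlapping probability mass.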

arXiv.org e-Print Archive

Three-stream 3D/1D CNN for fine-grained action classification and segmentation in table tennis

Author: Benois-Pineau J.
Martin P.
Morlier J.
Péteri R.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 29/09/2021

This paper proposes a method for fusing modalities extracted from video through a three-stream network with spatio-temporal and temporal convolutions for fine-grained action classification in sport. It is applied to the TTStroke-21 dataset, which consists of untrimmed videos of table tennis games. The goal is to detect and classify table tennis strokes in the videos, the first step of a bigger scheme aiming at giving feedback to the players to improve their performance. The three modalities are raw RGB data, the computed optical flow and the estimated pose of the player. The network consists of three branches with attention blocks. Features are fused at the latest stage of the network using bilinear layers. Compared to previous approaches, the use of three modalities allows faster convergence and better performance on both tasks: classification of strokes with known temporal boundaries, and joint segmentation and classification. The pose is also further investigated in order to offer richer feedback to the athletes.
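A bilinear fusion layer computes each output unit as z_k = x^T W_k y + b_k, the same form as PyTorch's `nn.Bilinear`. The plain-Python sketch below shows that computation for two feature vectors. How the paper chains its three streams through such layers is not stated in the abstract, so the two-input case, the function name and the toy weights are assumptions:

```python
def bilinear(x, y, weights, bias):
    """Bilinear layer: z_k = x^T W_k y + b_k for each output unit k.

    weights[k] is a len(x) x len(y) matrix (list of rows); bias[k] is a scalar.
    """
    out = []
    for w_k, b_k in zip(weights, bias):
        z = b_k
        # Every pair of features (x_i, y_j) interacts through weight W_k[i][j].
        for i, xi in enumerate(x):
            for j, yj in enumerate(y):
                z += xi * w_k[i][j] * yj
        out.append(z)
    return out
```

Unlike concatenation followed by a linear layer, every output depends on products of features from both streams. This multiplicative interaction is the usual motivation for bilinear fusion, at the cost of a weight tensor of size `outputs x len(x) x len(y)`.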

arXiv.org e-Print Archive

MPG.PuRe

Clustering of scene repeats for essential rushes preview

Author: Benini Sergio
BENOIS PINEAU J.
Leonardi Riccardo
Mansencal B
Rossi Eliana
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2009

This paper focuses on a specific type of unedited video content, called rushes, which are used for movie editing and usually present a high level of redundancy. Our goal is to automatically extract a summarized preview in which redundant material is reduced without discarding any important event. To achieve this, the rushes content is first analysed and modelled. Then different clustering techniques on shot key-frames are presented and compared in order to choose the best representative segments to enter the preview. Experiments performed on TRECVID data are evaluated by computing the mutual information between the obtained results and a manually annotated ground truth.
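Comparing a clustering against a ground truth with mutual information can be sketched as below. This assumes each key-frame carries one cluster label and one ground-truth label. The abstract does not say whether the score is normalised, so this computes the raw value in bits; the function name is illustrative:

```python
import math
from collections import Counter


def mutual_information(labels_a, labels_b):
    """Mutual information (in bits) between two labelings of the same items."""
    n = len(labels_a)
    joint = Counter(zip(labels_a, labels_b))
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    mi = 0.0
    for (a, b), n_ab in joint.items():
        # p(a,b) * log2(p(a,b) / (p(a) p(b))), rewritten with raw counts.
        mi += (n_ab / n) * math.log2(n_ab * n / (count_a[a] * count_b[b]))
    return mi
```

The score ignores how clusters are named: `[0, 0, 1, 1]` against `[1, 1, 0, 0]` gives 1 bit, the same as a perfect match. An independent labeling such as `[0, 1, 0, 1]` gives 0.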

Archivio istituzionale della ricerca - Università di Brescia

Sports video: Fine-grained action detection and classification of table tennis strokes from videos for MediaEval 2021

Author: Benois-Pineau J.
Calandre J.
Mansencal B.
Martin P.
Mascarilla R.
Morlier J.
Publication date: 16/12/2021

This paper presents the baseline method proposed for the Sports Video task of the MediaEval 2021 benchmark. This task comprises two subtasks, stroke detection and stroke classification, and this baseline addresses both. The spatio-temporal CNN architecture and the training process of the model are tailored to the addressed subtask. The method is intended to help the participants solve the task and is not meant to reach state-of-the-art performance. Still, for the detection task, the baseline performs better than the other participants, which stresses the difficulty of such a task.

MPG.PuRe

Mumford dendrograms and discrete p-adic symmetries

Author: A. Yu. Khrennikov
B. Dragovich
D. Mumford
F. Kato
F. Murtagh
J. Benois-Pineau
J. Tate
L. O. Chekhov
P. E. Bradley
Publication venue: 'Pleiades Publishing Ltd'
Publication date: 09/09/2008

In this article, we present an effective encoding of dendrograms by embedding them into the Bruhat-Tits trees associated to $p$-adic number fields. As an application, we show how strings over a finite alphabet can be encoded in cyclotomic extensions of $\mathbb{Q}_p$ and discuss $p$-adic DNA encoding. The application leads to fast $p$-adic agglomerative hierarchic algorithms similar to the ones recently used e.g. by A. Khrennikov and others. From the viewpoint of $p$-adic geometry, to encode a dendrogram $X$ in a $p$-adic field $K$ means to fix a set $S$ of $K$-rational punctures on the $p$-adic projective line $\mathbb{P}^1$. To $\mathbb{P}^1\setminus S$ is associated in a natural way a subtree inside the Bruhat-Tits tree which recovers $X$, a method first used by F. Kato in 1999 in the classification of discrete subgroups of $\textrm{PGL}_2(K)$. Next, we show how the $p$-adic moduli space $\mathfrak{M}_{0,n}$ of $\mathbb{P}^1$ with $n$ punctures can be applied to the study of time series of dendrograms and those symmetries arising from hyperbolic actions on $\mathbb{P}^1$. In this way, we can associate to certain classes of dynamical systems a Mumford curve, i.e. a $p$-adic algebraic curve with totally degenerate reduction modulo $p$. Finally, we indicate some of our results in the study of general discrete actions on $\mathbb{P}^1$, and their relation to $p$-adic Hurwitz spaces.

Comment: 14 pages, 6 figures
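A simplified illustration of the idea behind the encoding follows. It assumes an alphabet of exactly $p$ letters mapped to the digits $0, \dots, p-1$ and words of equal length, and it works in the $p$-adic integers rather than in the cyclotomic extensions the paper uses. Each word becomes a truncated $p$-adic integer, and two words that share a longer prefix are $p$-adically closer. This is the ultrametric (dendrogram) structure the abstract builds on. The function names are hypothetical:

```python
def encode(word, p):
    """Encode digits w_0, w_1, ... in {0, ..., p-1} as the integer sum(w_i * p**i).

    The i-th letter becomes the i-th p-adic digit (a truncated p-adic integer).
    """
    return sum(d * p**i for i, d in enumerate(word))


def valuation(x, p):
    """p-adic valuation v_p(x) of a non-zero integer x."""
    v = 0
    while x % p == 0:
        x //= p
        v += 1
    return v


def padic_distance(u, v, p):
    """|u - v|_p = p**(-v_p(u - v)); zero when u == v."""
    if u == v:
        return 0.0
    return p ** -valuation(u - v, p)
```

With $p = 3$, the words `[0, 1, 2]` and `[0, 1, 0]` encode to 21 and 3. Their difference 18 has valuation 2, so their distance is $3^{-2}$, and 2 is exactly the length of their common prefix. Words that differ in the first letter are at distance 1.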

arXiv.org e-Print Archive

CiteSeerX

Crossref

A $p$-adic RanSaC algorithm for stereo vision using Hensel lifting

Author: D. Nistér
E. Kruppa
F. Murtagh
H. C. Longuet-Higgins
H. Stewénius
J. Benois-Pineau
J. Philip
M. A. Fischler
O. D. Faugeras
P. E. Bradley
Patrick Erik Bradley
R. Hartley
T. S. Huang
Y. Linde
Publication venue: 'Pleiades Publishing Ltd'
Publication date: 03/11/2009

A $p$-adic variation of the Ran(dom) Sa(mple) C(onsensus) method for solving the relative pose problem in stereo vision is developed. From two 2-adically encoded images, a random sample of five pairs of corresponding points is taken, and the equations for the essential matrix are solved by lifting solutions modulo 2 to the 2-adic integers. A recently devised $p$-adic hierarchical classification algorithm imitating the known LBG quantisation method classifies the solutions for all the samples after having determined the number of clusters using the known intra-inter validity of clusterings. In the successful case, a cluster ranking will determine the cluster containing a 2-adic approximation to the "true" solution of the problem.

Comment: 15 pages; typos removed, abstract changed, computation error removed
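Lifting a solution modulo 2 to the 2-adic integers is an instance of Hensel's lemma. The sketch below shows the univariate case only, whereas the paper lifts solutions of a multivariate system for the essential matrix. A simple root of an integer polynomial modulo $p$ is refined by Newton steps to a root modulo $p^k$. The helper names are hypothetical, and `pow(x, -1, m)` needs Python 3.8 or later:

```python
def poly_eval(coeffs, x):
    """Evaluate a polynomial with integer coefficients (highest degree first)."""
    result = 0
    for c in coeffs:
        result = result * x + c
    return result


def derivative(coeffs):
    """Coefficients of the derivative, highest degree first."""
    degree = len(coeffs) - 1
    return [c * (degree - i) for i, c in enumerate(coeffs[:-1])]


def hensel_lift(coeffs, root, p, k):
    """Lift a simple root of f modulo p to a root modulo p**k (Newton/Hensel).

    Requires f(root) = 0 (mod p) and f'(root) != 0 (mod p).
    """
    df = derivative(coeffs)
    target = p**k
    if poly_eval(coeffs, root) % p != 0 or poly_eval(df, root) % p == 0:
        raise ValueError("root must be a simple root modulo p")
    a, modulus = root % p, p
    while modulus < target:
        # Each Newton step doubles the number of correct p-adic digits.
        modulus = min(modulus * modulus, target)
        fa = poly_eval(coeffs, a) % modulus
        inv = pow(poly_eval(df, a) % modulus, -1, modulus)
        a = (a - fa * inv) % modulus
    return a
```

For $f(x) = x^2 + x - 6$ and $p = 2$, the roots 0 and 1 modulo 2 lift to 2 and to $-3 \equiv 1021 \pmod{2^{10}}$, the two integer roots of $f$. For a polynomial such as $x^2 + x + 2$ with no rational roots, the same call still returns a root modulo $2^k$, that is, a truncated 2-adic solution.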

arXiv.org e-Print Archive

Crossref

Hierarchical Hidden Markov Model in Detecting Activities of Daily Living in Wearable Videos for Studies of Dementia

Author: A Doherty
B Scholkopf
C Burges
E Kijak
H Amieva
H Bay
J Pinquier
Jean-François Dartigues
Jenny Benois-Pineau
JS Boreczky
Julien Pinquier
L Ballan
LR Rabiner
M Delakis
M Ostendorf
R André-Obrecht
R Hamid
R Poppe
Régine André-Obrecht
Rémi Mégret
S Fine
SP Chatzis
Svebor Karaman
Vladislavs Dovgalecs
Y Ivanov
Yann Gaëstel
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 06/01/2014

This paper presents a method for indexing activities of daily living in videos obtained from wearable cameras. In the context of dementia diagnosis by doctors, the videos are recorded at patients' homes and later viewed by the medical practitioners. The videos may last up to two hours; therefore, a tool for efficient navigation in terms of activities of interest is crucial for the doctors. The specific recording mode produces very challenging video data: a single sequence shot in which strong motion and sharp lighting changes often appear. Our work introduces an automatic motion-based segmentation of the video and a video structuring approach in terms of activities by a hierarchical two-level Hidden Markov Model. We define our description space over motion and visual characteristics of video and audio channels. Experiments on real data recorded at the homes of several patients show the difficulty of the task and the promising results of our approach.
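Decoding an activity sequence from per-segment observations relies on the Viterbi algorithm. The sketch below is a flat, single-level discrete HMM with hypothetical states and observation symbols, not the paper's two-level hierarchical model. Because it works in log space, it assumes that all probabilities are non-zero:

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Most likely hidden state sequence of a flat discrete HMM."""
    # best[s] = log-probability of the best path ending in state s.
    best = {s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
            for s in states}
    back = []
    for obs in observations[1:]:
        pointers, new_best = {}, {}
        for s in states:
            # Best predecessor r for entering state s at this step.
            prev, score = max(
                ((r, best[r] + math.log(trans_p[r][s])) for r in states),
                key=lambda item: item[1])
            new_best[s] = score + math.log(emit_p[s][obs])
            pointers[s] = prev
        back.append(pointers)
        best = new_best
    # Trace the best final state back through the stored pointers.
    state = max(best, key=best.get)
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]
```

With two sticky states `"A"` and `"B"` (self-transition 0.9) that mostly emit `"x"` and `"y"` respectively, the observation sequence `x, x, y, y` decodes to `A, A, B, B`. In a hierarchical model, a decoder like this would run inside each upper-level activity state.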

arXiv.org e-Print Archive

CiteSeerX

Crossref

Scientific Publications of the University of Toulouse II Le Mirail

HAL Descartes

COST292 experimental framework for TRECVID 2006

Author: Aksoy S.
Alatan A.
Avrithis Y.
Benois-Pineau J.
Campbell N.
Dalkilic A.
Doulaverakis C.
Hanjalic A.
Izquierdo E.
Jarina R.
Kompatsiaris I.
Koumoulos G.
Krämer P.
Mezaris V.
Naci U.
Saracoglu A.
Spyrou E.
Vrochidis S.
Zhang Q.
Ćalić J.
Publication venue: 'National Institute of Standards and Technology (NIST)'
Publication date: 01/01/2006

In this paper we give an overview of the four TRECVID tasks submitted by COST292, a European network of institutions in the area of semantic multimodal analysis and retrieval of digital video media. First, we present a shot boundary evaluation method based on results merged using a confidence measure. The two shot boundary (SB) detectors used here, one from the Technical University of Delft and one from LaBRI, University of Bordeaux 1, are presented, followed by the description of the merging algorithm. The high-level feature extraction task comprises three separate systems. The first system, developed by the National Technical University of Athens (NTUA), utilises a set of MPEG-7 low-level descriptors and Latent Semantic Analysis to detect the features. The second system, developed by Bilkent University, uses a Bayesian classifier trained with a "bag of subregions" for each keyframe. The third system, by the Middle East Technical University (METU), exploits textual information in the video using character recognition methodology. The system submitted to the search task is an interactive retrieval application developed by Queen Mary, University of London, University of Zilina and ITI from Thessaloniki. It combines basic retrieval functionalities in various modalities (i.e. visual, audio, textual) with a user interface that supports submitting queries with any combination of the available retrieval tools and accumulating relevant retrieval results over all queries submitted by a single user during a specified time interval. Finally, the rushes task submission comprises a video summarisation and browsing system specifically designed to present rushes material intuitively and efficiently in a video production environment. This system is the result of joint work by the University of Bristol, the Technical University of Delft and LaBRI, University of Bordeaux 1.
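The abstract does not detail the merging algorithm, so the following is a hypothetical confidence-based scheme, not COST292's method. Boundaries reported by either detector within a few frames of each other are treated as the same cut and their confidences are summed. A cut is kept when its total confidence reaches a threshold. The function name and the `tolerance` and `threshold` values are illustrative assumptions:

```python
def merge_boundaries(det_a, det_b, tolerance=2, threshold=0.5):
    """Merge (frame, confidence) shot-boundary lists from two detectors.

    Boundaries within `tolerance` frames of a candidate's first frame are
    treated as one cut whose confidence is the sum of all its reports; the cut
    is kept when that combined confidence reaches `threshold`.
    """
    candidates = sorted(det_a + det_b)
    merged = []
    for frame, conf in candidates:
        if merged and frame - merged[-1][0] <= tolerance:
            # Same cut reported again: accumulate its confidence.
            first_frame, total = merged[-1]
            merged[-1] = (first_frame, total + conf)
        else:
            merged.append((frame, conf))
    return [frame for frame, total in merged if total >= threshold]
```

Here a cut that both detectors report with moderate confidence (0.4 and 0.3) survives, while a cut that only one detector reports with low confidence (0.2) is dropped. This is the intuition behind merging with a confidence measure, whatever the exact rule the paper uses.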

Bilkent University Institutional Repository